Assuring Agent Safety Evaluations By Analysing Transcripts
lesswrong.com·9h
Effect Inference
From CAP to GAP?
fsharpforfunandprofit.com·12h·Discuss: DEV
🧪Testing Compilers
IASC: Interactive Agentic System for ConLangs
arxiv.org·15h
🔄Incremental Lexing
Show HN: Comparegpt.io – Trustworthy Mode to reduce LLM hallucinations
news.ycombinator.com·18h·Discuss: Hacker News
🧪Parser Testing
I built a translator for spatial thinking (because I can't interview in Python)
graemefawcett.ca·11m·Discuss: Hacker News
🎮Language Ergonomics
Slip – A Lisp System in JavaScript
lisperator.net·5h·Discuss: Hacker News
🌱Minimal Lisps
Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks
machinelearning.apple.com·1d
📊LR Parsing
Three ways formally verified code can go wrong in practice
buttondown.com·2h
📜Proof Languages
A small number of samples can poison LLMs of any size
dev.to·17h·Discuss: DEV
🗺️Region Inference
Three Solutions to Nondeterminism in AI
blog.hellas.ai·2d·Discuss: Hacker News
Type Checking
Let's Write a Macro in Rust
hackeryarn.com·3h·Discuss: Hacker News
🦀Rust Macros
Show HN: Realization Jsmn on a Pure Zig
github.com·9h·Discuss: Hacker News
📋JSON Parsing
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
arxiv.org·4d
🧪Parser Testing
ReasonScape Evaluation: AI21 Jamba Reasoning vs Qwen3 4B vs Qwen3 4B 2507
reddit.com·1d·Discuss: r/LocalLLaMA
🏁Language Benchmarks
LangChain.js is overrated; Build your AI agent with a simple fetch call
blog.logrocket.com·1d
🚂Cranelift Backend
Stress-Testing Model Specs Reveals Character Differences among Language Models
arxiv.org·15h
📋Backus-Naur Form
How static analysis encourages developers to refactor code: Another look at Source SDK
dev.to·4h·Discuss: DEV
🪄C Metaprogramming
Show HN: Using an LLM to sensibly sort a shopping receipt
treblig.org·1d·Discuss: Hacker News
🪢Rope Algorithms
The RAG Playbook: A Data Science Guide to Document Chunking
pub.towardsai.net·2h
🌱Minimal ML